Conversation

@Red-Portal Red-Portal commented Sep 15, 2025

This PR removes the use of the type ParamSpaceSGD, which provides a unifying implementation of VI algorithms that run SGD in parameter space. Instead, each parameter-space SGD-based VI algorithm becomes its own AbstractVariationalAlgorithm, and the code implementing step is shared by dispatching over their Union.

This addresses #204
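For readers skimming the diff, here is a minimal sketch of the arrangement described above; the type parameters, fields, and the step body are illustrative only and not the actual code in this PR:

```julia
abstract type AbstractVariationalAlgorithm end

# Each algorithm is its own subtype of AbstractVariationalAlgorithm.
struct KLMinRepGradDescent{Obj} <: AbstractVariationalAlgorithm
    objective::Obj
end
struct KLMinRepGradProxDescent{Obj} <: AbstractVariationalAlgorithm
    objective::Obj
end
struct KLMinScoreGradDescent{Obj} <: AbstractVariationalAlgorithm
    objective::Obj
end

# The former ParamSpaceSGD type is reduced to a Union used only for dispatch.
const ParamSpaceSGD = Union{
    KLMinRepGradDescent,KLMinRepGradProxDescent,KLMinScoreGradDescent
}

# One shared `step` covers all three algorithms.
function step(alg::ParamSpaceSGD, state)
    # estimate the gradient of the objective and apply an SGD update in
    # parameter space; details omitted in this sketch
    return state
end
```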

@Red-Portal (Member Author)

Hi @yebai, could you check to make sure that this is what you asked for? Personally, I feel the ParamSpaceSGD-based interface is much cleaner and more intuitive, especially in terms of project structure. So I still insist we keep it as an internal implementation detail.

Red-Portal and others added 3 commits September 15, 2025 12:08
elseif alg isa KLMinScoreGradDescent
    return KLMinScoreGradDescentState(prob, q_init, 0, grad_buf, opt_st, obj_st, avg_st)
else
    nothing
Review comment (Member):

Maybe throw a warning or error message here instead of letting it fail silently?

Suggested change
nothing
nothing

@Red-Portal Red-Portal (Member Author), Sep 15, 2025:

It should never hit the else condition, so let me use InvalidStateException.
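Purely as an illustration of that change (the function name and argument list below are hypothetical, mirroring the diff context; the message text is made up), the fall-through branch could look like:

```julia
# Illustrative sketch: throw instead of silently returning `nothing` in the
# branch that should never be reached.
function make_state(alg, prob, q_init, grad_buf, opt_st, obj_st, avg_st)
    if alg isa KLMinScoreGradDescent
        return KLMinScoreGradDescentState(prob, q_init, 0, grad_buf, opt_st, obj_st, avg_st)
    else
        # previously `nothing`; now fail loudly since this branch should be unreachable
        throw(InvalidStateException("unsupported algorithm type $(typeof(alg))", :unreachable))
    end
end
```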

    prob, re(params), iteration, grad_buf, opt_st, obj_st, avg_st
)
else
    nothing
Review comment (Member):

Same as above.

Suggested change
nothing
nothing

    obj_st::ObjSt
    avg_st::AvgSt
end
const ParamSpaceSGD = Union{
Review comment (Member):

Suggested change
const ParamSpaceSGD = Union{
"""
This family of algorithms (`<:KLMinRepGradDescent`,`<:KLMinRepGradProxDescent`,`<:KLMinScoreGradDescent`) applies stochastic gradient descent (SGD) to the variational `objective` over the (Euclidean) space of variational parameters.
The trainable parameters in the variational approximation are expected to be extractable through `Optimisers.destructure`.
This requires the variational approximation to be marked as a functor through `Functors.@functor`.
"""
const ParamSpaceSGD = Union{
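As a toy illustration of the requirement described in the suggested docstring (ToyGaussian is a made-up type, not part of AdvancedVI):

```julia
using Functors, Optimisers

# Marking the approximation as a functor lets Optimisers.destructure extract
# its trainable parameters as a flat vector and rebuild the approximation.
struct ToyGaussian{T}
    location::Vector{T}
    log_scale::Vector{T}
end
Functors.@functor ToyGaussian

q = ToyGaussian(zeros(2), zeros(2))
params, re = Optimisers.destructure(q)  # flat parameter vector + rebuilder
q_new = re(params .+ 0.1)               # rebuild from updated parameters
```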

@yebai yebai left a comment

Thanks @Red-Portal -- I left some comments above. In addition, let's simplify the folder structure a bit for clarity:

  1. move all files in paramspacesgd to algorithms, e.g., "algorithms/paramspacesgd/constructors.jl" to "algorithms/constructors.jl"
  2. keep each algorithm in its own file

Also, I'd suggest we consider renaming paramspacesgd.jl to interface.jl or something along those lines:

  • "algorithms/paramspacesgd/paramspacesgd.jl" to "algorithms/interface.jl"

@Red-Portal (Member Author)

Hi Hong, I planned to do the restructuring in a separate PR to keep things simple in this one. Though, regarding:

> move all files in paramspacesgd to algorithms, e.g., "algorithms/paramspacesgd/constructors.jl" to "algorithms/constructors.jl"

> "algorithms/paramspacesgd/paramspacesgd.jl" to "algorithms/interface.jl"

After the release of v0.5, we'll be adding algorithms that don't conform to the original ParamSpaceSGD formalism, so I don't think these names will withstand that change. In fact, that is precisely why I kept everything under paramspacesgd/ in the first place.

@yebai yebai commented Sep 15, 2025

It is okay to keep all algorithms under algorithms and remove the extra subfolder paramspacesgd:

  1. These categorisations are nonstandard, so they are not helping clarity.
  2. There aren't many algorithms, so it is okay to keep all of them under the same algorithms folder.

> After the release of v0.5, we'll be adding algorithms that don't conform to the original ParamSpaceSGD formalism

You can add more algorithms to interface.jl, so long as these algorithms are clearly grouped in interface.jl.

@Red-Portal (Member Author)

> After the release of v0.5, we'll be adding algorithms that don't conform to the original ParamSpaceSGD formalism

> You can add more algorithms to interface.jl, so long as these algorithms are clearly grouped in interface.jl.

I am saying that these new algorithms can't be grouped in interface.jl, since they will each need their own implementation of step. The current algorithms all go through the same step, which is why they can be grouped (see the sketch below).
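Roughly what I mean (names are hypothetical, reusing the illustrative types from the PR description above): such an algorithm carries its own step method instead of going through the shared one.

```julia
# Hypothetical future algorithm that does not fit the shared SGD-in-parameter-space step.
struct SomeFutureAlgorithm <: AbstractVariationalAlgorithm end

function step(alg::SomeFutureAlgorithm, state)
    # custom update rule that does not reduce to "estimate a gradient, then SGD"
    return state
end
```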

@yebai yebai commented Sep 16, 2025

"grouping" refers to grouping interface code together for similar VI algorithms in the proposed interface.jl. It doesn't require creating a new union type if an algorithm is distinct from others. In these cases, the algorithm could be a singleton group.

@yebai yebai requested a review from sunxd3 September 16, 2025 07:13
@sunxd3 sunxd3 commented Sep 16, 2025

If I understand right, this PR flattens the existing AbstractVariationalAlgorithm → ParamSpaceSGD → (concrete algorithms) hierarchy. That intermediate layer exists today so that anything doing “SGD in parameter space” can share one abstraction.

Before we drop it, may I understand what concrete benefits the flattening delivers? In particular, are we planning to add other algorithm families alongside the current ParamSpaceSGD? If so, we then probably need to add the middle layer abstraction back.

At the moment, I haven't quite convinced myself that flattening the type hierarchy is necessary.
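For reference, the layered arrangement I'm referring to can be sketched roughly as follows (purely illustrative; the actual ParamSpaceSGD implementation may differ in its details):

```julia
abstract type AbstractVariationalAlgorithm end

# Intermediate layer: anything doing SGD in parameter space.
abstract type ParamSpaceSGD <: AbstractVariationalAlgorithm end

struct KLMinRepGradDescent{Obj} <: ParamSpaceSGD
    objective::Obj
end

# Shared behaviour attaches to the intermediate layer.
step(alg::ParamSpaceSGD, state) = state  # shared SGD update would live here
```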

@yebai yebai commented Sep 16, 2025

I suggested removing ParamSpaceSGD because it is not standard terminology in the variational inference literature. Adding this less well-known terminology adds mental overhead to understanding the code.

@Red-Portal Red-Portal commented Sep 16, 2025

@yebai @sunxd3 Thanks for chiming in. Actually, I have a new idea. I believe the main complaint at the moment is that the term ParamSpaceSGD is not intuitive as an abstraction. What if we fix that directly by changing the name to something more obvious?

In a nutshell, the nice thing about the current ParamSpaceSGD interface is that you only need to define a gradient estimator (estimate_gradient) of a corresponding objective to form an algorithm (there is a rough sketch of this below). So let me make that more explicit in the name. Here are a few candidates:

  • ObjectiveSGD
  • ObjectiveDescentInducedAlgorithm
  • MinObjectiveSGD
  • MinObjectiveWithSGD
  • MinObjectiveBySGD
  • ObjectiveGradientEstimateDescent

or something along these lines? Would that resolve your concern?
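To make the "only a gradient estimator is needed" point concrete, a rough sketch (the objective type and the signature below are illustrative, not the exact AdvancedVI API):

```julia
# Hypothetical objective: forming an algorithm mostly amounts to providing a
# gradient estimator for it.
struct MyObjective end

function estimate_gradient(rng, obj::MyObjective, params, restructure, state)
    grad = zero(params)  # a stochastic estimate of the objective's gradient would go here
    return grad, state
end
# The shared SGD machinery then consumes `grad`; no custom parameter-update
# code is needed for algorithms of this form.
```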

@yebai yebai commented Sep 16, 2025

I think I see your point. But I am not sure that helps.

> define a gradient estimator (estimate_gradient) of a corresponding objective to form an algorithm.

That probably includes every learning algorithm in ML.

@Red-Portal (Member Author)

> define a gradient estimator (estimate_gradient) of a corresponding objective to form an algorithm.

> That probably includes every learning algorithm in ML.

Yes, that is indeed almost true! But the point is that there are a couple of important algorithms that don't quite conform to this formalism, as they result in a custom update rule. They don't fall out of a gradient estimator, but modify the parameter update step too. So this is the reason I wish to allow for two different abstraction levels. But as you said, most algorithms only require defining a gradient estimator. So the lower-level interface helps unify the code for all those algorithms.

@yebai yebai commented Sep 17, 2025

> a couple of important algorithms that don't quite conform to this formalism, as they result in a custom update rule. They don't fall out of a gradient estimator, but modify the parameter update step too.

We are at risk of premature abstraction and introducing heuristic terminology here. It is better to work with concrete algorithms, and define a union type if sharing code is needed (eg, step) across algorithms.

There might be some insights we can learn by taking a unifying view of parameter space gradient descent VI, but that is a discussion we should have offline for a review paper.

@Red-Portal (Member Author)

> We are at risk of premature abstraction and introducing heuristic terminology here. It is better to work with concrete algorithms, and define a union type if sharing code is needed (eg, step) across algorithms.

My main beef with using Unions here is the following:

  • This results in duplicating code, as can already be seen in the PR (the structs KLMinRepGradDescent, KLMinRepGradProxDescent, and KLMinScoreGradDescent all have the same fields; see the sketch after this list).
  • To use the shared step function, an <:AbstractVariationalAlgorithm struct has to contain a specific list of fields, a requirement that is now implicit since we're not using a proper interface.
  • I am not sure whether I should document the use of the shared step function. But if I do document it, it's not going to be pretty since it assumes a whole lot of implicit things (as stated in the item above).
  • The structuring of the project will need a bit of discussion since things are a bit more complicated. (What should we call the file containing the shared step function? How should we structure the directories?) As mentioned in a previous comment, we can't just use generic names since I will be adding algorithms that don't use the shared step function in follow-up PRs. Also, people looking at the code base for the first time will probably have to do some mental gymnastics, since lots of things are implicit rather than made explicit through an interface.
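To illustrate the first bullet, the kind of duplication I mean looks roughly like this (the field names are illustrative, not the exact ones in the PR):

```julia
# Each algorithm struct repeats the same fields so that the shared `step`
# can assume they exist; nothing enforces this beyond convention.
struct KLMinRepGradDescent{Obj,AD,Opt,Avg}
    objective::Obj
    adtype::AD
    optimizer::Opt
    averager::Avg
end
struct KLMinScoreGradDescent{Obj,AD,Opt,Avg}
    objective::Obj
    adtype::AD
    optimizer::Opt
    averager::Avg
end
```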

With that said, do you find the solution below still unsatisfying? At least I hope that this resolves your concern that the terminology is non-standard.

> So let me make that more explicit in the name. Here are a few candidates:
>
>   • ObjectiveSGD
>   • ObjectiveDescentInducedAlgorithm
>   • MinObjectiveSGD
>   • MinObjectiveWithSGD
>   • MinObjectiveBySGD
>   • ObjectiveGradientEstimateDescent
>
> or something along these lines? Would that resolve your concern?

If you think we should still go with an implicit interface, then I'll follow for the sake of moving forward.
